SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Choi et al.

(ii) TITLE OF INVENTION: Streptococcus pneumoniae Antigens and Vaccines

(iii) NUMBER OF SEQUENCES: 4

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Human Genome Sciences, Inc.

(B) STREET: 9410 Key West Avenue

(C) CITY: Rockville

(D) STATE: Maryland

(E) COUNTRY: USA

(F) ZIP: 20850

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage

(B) COMPUTER: HP Vectra 486/33

(C) OPERATING SYSTEM: MSDOS version 6.2

(D) SOFTWARE: ASCII Text



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/961,083 

(B) FILING DATE: OCT-30-1997

(C) CLASSIFICATION: 



(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 60/029,960

(B) FILING DATE: OCT-31-199*



(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Michelle S. Marks 

(B) REGISTRATION NUMBER: 41,971 

(C) REFERENCE/DOCKET NUMBER: 




(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (301) 309-8504 

(B) TELEFAX: (301) 309-8512 



(2) INFORMATION FOR SEQ ID NO: 1: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2389 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 



TTCTTACGAG TTGGGACTGT ATCAAGCTAG AACGGTTAAG GAAAATAATC GTGTTTCCTA 60

TATAGATGGA AAACAAGCGA CGCAAAAAAC GGAGAATTTG ACTCCTGATG AGGTTAGCAA 120

GCGTGAAGGA ATCAATGCTG AGCAAATCGT CATCAAGATA ACAGACCAAG GCTATGTCAC 180 

TTCACATGGC GACCACTATC ATTATTACAA TGGTAAGGTT CCTTATGACG CTATCATCAG 240 

TGAAGAATTA CTCATGAAAG ATCCAAACTA TAAGCTAAAA GATGAGGATA TTGTTAATGA 300 

GGTCAAGGGT GGATATGTTA TCAAGGTAGA TGGAAAATAC TATGTTTACC TTAAGGATGC 360 

TGCCCACGCG GATAACGTCC GTACAAAAGA GGAAATCAAT CGACAAAAAC AAGAGCATAG 420

TCAACATCGT GAAGGTGGAA CTCCAAGAAA CGATGGTGCT GTTGCCTTGG CACGTTCGCA 480

AGGACGCTAT ACTACAGATG ATGGTTATAT CTTTAATGCT TCTGATATCA TAGAGGATAC 540

TGGTGATGCT TATATCGTTC CTCATGGAGA TCATTACCAT TACATTCCTA AGAATGAGTT 600 

ATCAGCTAGC GAGTTGGCTG CTGCAGAAGC CTTCCTATCT GGTCGAGGAA ATCTGTCAAA 660 

TTCAAGAACC TATCGCCGAC AAAATAGCGA TAACACTTCA AGAACAAACT GGGTACCTTC 720 

TGTAAGCAAT CCAGGAACTA CAAATACTAA CACAAGCAAC AACAGCAACA CTAACAGTCA 780 

AGCAAGTCAA AGTAATGACA TTGATAGTCT CTTGAAACAG CTCTACAAAC TGCCTTTGAG 840 

TCAACGACAT GTAGAATCTG ATGGCCTTGT CTTTGATCCA GCACAAATCA CAAGTCGAAC 900

AGCTAGAGGT GTTGCAGTGC CACACGGAGA TCATTACCAC TTCATCCCTT ACTCTCAAAT 960

GTCTGAATTG GAAGAACGAA TCGCTCGTAT TATTCCCCTT CGTTATCGTT CAAACCATTG 1020 

GGTACCAGAT TCAAGGCCAG AACAACCAAG TCCACAACCG ACTCCGGAAC CTAGTCCAGG 1080

CCCGCAACCT GCACCAAATC TTAAAATAGA CTCAAATTCT TCTTTGGTTA GTCAGCTGGT 1140

ACGAAAAGTT GGGGAAGGAT ATGTATTCGA AGAAAAGGGC ATCTCTCGTT ATGTCTTTGC 1200 

GAAAGATTTA CCATCTGAAA CTGTTAAAAA TCTTGAAAGC AAGTTATCAA AACAAGAGAG 1260 

TGTTTCACAC ACTTTAACTG CTAAAAAAGA AAATGTTGCT CCTCGTGACC AAGAATTTTA 1320 

TGATAAAGCA TATAATCTGT TAACTGAGGC TCATAAAGCC TTGTTTGNAA ATAAGGGTCG 1380 

TAATTCTGAT TTCCAAGCCT TAGACAAATT ATTAGAACGC TTGAATGATG AATCGACTAA 1440

TAAAGAAAAA TTGGTAGATG ATTTATTGGC ATTCCTAGCA CCAATTACCC ATCCAGAGCG 1500

ACTTGGCAAA CCAAATTCTC AAATTGAGTA TACTGAAGAC GAAGTTCGTA TTGCTCAATT 1560 

AGCTGATAAG TATACAACGT CAGATGGTTA CATTTTTGAT GAACATGATA TAATCAGTGA 1620 

TGAAGGAGAT GCATATGTAA CGCCTCATAT GGGCCATAGT CACTGGATTG GAAAAGATAG 1680 

CCTTTCTGAT AAGGAAAAAG TTGCAGCTCA AGCCTATACT AAAGAAAAAG GTATCCTACC 1740

TCCATCTCCA GACGCAGATG TTAAAGCAAA TCCAACTGGA GATAGTGCAG CAGCTATTTA 1800 

CAATCGTGTG AAAGGGGAAA AACGAATTCC ACTCGTTCGA CTTCCATATA TGGTTGAGCA 1860 

TACAGTTGAG GTTAAAAACG GTAATTTGAT TATTCCTCAT AAGGATCATT ACCATAATAT 1920 
TAAATTTGCT TGGTTTGATG ATCACACATA CAAAGCTCCA AATGGCTATA CCTTGGAAGA 1980

TTTGTTTGCG ACGATTAAGT ACTACGTAGA ACACCCTGAC GAACGTCCAC ATTCTAATGA 2040 

TGGATGGGGC AATGCCAGTG AGCATGTGTT AGGCAAGAAA GACCACAGTG AAGATCCAAA 2100 

TAAGAACTTC AAAGCGGATG AAGAGCCAGT AGAGGAAACA CCTGCTGAGC CAGAAGTCCC 2160 

TCAAGTAGAG ACTGAAAAAG TAGAAGCCCA ACTCAAAGAA GCAGAAGTTT TGCTTGCGAA 2220

AGTAACGGAT TCTAGTCTGA AAGCCAATGC AACAGAAACT CTAGCTGGTT TACGAAATAA 2280

TTTGACTCTT CAAATTATGG ATAACAATAG TATCATGGCA GAAGCAGAAA AATTACTTGC 2340 

GTTGTTAAAA GGAAGTAATC CTTCATCTGT AAGTAAGGAA AAAATAAAC 2389
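
In the sequence description above, bases are grouped in blocks of ten and the number ending each line is the running total of bases up to that point. A minimal Python sketch of a consistency check for such lines is given here; the function name `check_sequence_lines` and its `start` parameter are illustrative assumptions and are not part of the listing.

```python
import re

def check_sequence_lines(lines, start=0):
    """Check 'BLOCK BLOCK ... COUNT' lines against their running base counts.

    `start` is the number of bases that precede the first supplied line.
    Returns (total_bases, problems), where problems is a list of
    (line_number, message) tuples.
    """
    total = start
    problems = []
    for number, line in enumerate(lines, start=1):
        # Bases (A, C, G, T, N) in space-separated blocks, then one integer.
        match = re.fullmatch(r"\s*([ACGTN ]+?)\s+(\d+)\s*", line.upper())
        if match is None:
            problems.append((number, "unparseable line"))
            continue
        bases = match.group(1).replace(" ", "")
        stated = int(match.group(2))
        total += len(bases)
        if total != stated:
            problems.append((number, f"counted {total}, line says {stated}"))
            total = stated  # resynchronise on the stated count
    return total, problems

# Lines ending at positions 180 and 240 of SEQ ID NO: 1.
sample = [
    "GCGTGAAGGA ATCAATGCTG AGCAAATCGT CATCAAGATA ACAGACCAAG GCTATGTCAC 180",
    "TTCACATGGC GACCACTATC ATTATTACAA TGGTAAGGTT CCTTATGACG CTATCATCAG 240",
]
print(check_sequence_lines(sample, start=120))
```

A line whose trailing count has been split by stray spaces is reported as unparseable rather than silently accepted, which makes transcription damage easy to locate.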

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 796 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Ser Tyr Glu Leu Gly Leu Tyr Gln Ala Arg Thr Val Lys Glu Asn Asn
1 5 10 15

Arg Val Ser Tyr Ile Asp Gly Lys Gln Ala Thr Gln Lys Thr Glu Asn
20 25 30

Leu Thr Pro Asp Glu Val Ser Lys Arg Glu Gly Ile Asn Ala Glu Gln
35 40 45

Ile Val Ile Lys Ile Thr Asp Gln Gly Tyr Val Thr Ser His Gly Asp
50 55 60

His Tyr His Tyr Tyr Asn Gly Lys Val Pro Tyr Asp Ala Ile Ile Ser
65 70 75 80

Glu Glu Leu Leu Met Lys Asp Pro Asn Tyr Lys Leu Lys Asp Glu Asp
85 90 95

Ile Val Asn Glu Val Lys Gly Gly Tyr Val Ile Lys Val Asp Gly Lys
100 105 110

Tyr Tyr Val Tyr Leu Lys Asp Ala Ala His Ala Asp Asn Val Arg Thr
115 120 125

Lys Glu Glu Ile Asn Arg Gln Lys Gln Glu His Ser Gln His Arg Glu
130 135 140

Gly Gly Thr Pro Arg Asn Asp Gly Ala Val Ala Leu Ala Arg Ser Gln
145 150 155 160

Gly Arg Tyr Thr Thr Asp Asp Gly Tyr Ile Phe Asn Ala Ser Asp Ile
165 170 175

Ile Glu Asp Thr Gly Asp Ala Tyr Ile Val Pro His Gly Asp His Tyr
180 185 190



His Tyr Ile Pro Lys Asn Glu Leu Ser Ala Ser Glu Leu Ala Ala Ala
195 200 205

Glu Ala Phe Leu Ser Gly Arg Gly Asn Leu Ser Asn Ser Arg Thr Tyr
210 215 220

Arg Arg Gln Asn Ser Asp Asn Thr Ser Arg Thr Asn Trp Val Pro Ser
225 230 235 240

Val Ser Asn Pro Gly Thr Thr Asn Thr Asn Thr Ser Asn Asn Ser Asn
245 250 255

Thr Asn Ser Gln Ala Ser Gln Ser Asn Asp Ile Asp Ser Leu Leu Lys
260 265 270

Gln Leu Tyr Lys Leu Pro Leu Ser Gln Arg His Val Glu Ser Asp Gly
275 280 285

Leu Val Phe Asp Pro Ala Gln Ile Thr Ser Arg Thr Ala Arg Gly Val
290 295 300

Ala Val Pro His Gly Asp His Tyr His Phe Ile Pro Tyr Ser Gln Met
305 310 315 320

Ser Glu Leu Glu Glu Arg Ile Ala Arg Ile Ile Pro Leu Arg Tyr Arg
325 330 335

Ser Asn His Trp Val Pro Asp Ser Arg Pro Glu Gln Pro Ser Pro Gln
340 345 350

Pro Thr Pro Glu Pro Ser Pro Gly Pro Gln Pro Ala Pro Asn Leu Lys
355 360 365

Ile Asp Ser Asn Ser Ser Leu Val Ser Gln Leu Val Arg Lys Val Gly
370 375 380

Glu Gly Tyr Val Phe Glu Glu Lys Gly Ile Ser Arg Tyr Val Phe Ala
385 390 395 400

Lys Asp Leu Pro Ser Glu Thr Val Lys Asn Leu Glu Ser Lys Leu Ser
405 410 415

Lys Gln Glu Ser Val Ser His Thr Leu Thr Ala Lys Lys Glu Asn Val
420 425 430

Ala Pro Arg Asp Gln Glu Phe Tyr Asp Lys Ala Tyr Asn Leu Leu Thr
435 440 445

Glu Ala His Lys Ala Leu Phe Xaa Asn Lys Gly Arg Asn Ser Asp Phe
450 455 460

Gln Ala Leu Asp Lys Leu Leu Glu Arg Leu Asn Asp Glu Ser Thr Asn
465 470 475 480

Lys Glu Lys Leu Val Asp Asp Leu Leu Ala Phe Leu Ala Pro Ile Thr
485 490 495

His Pro Glu Arg Leu Gly Lys Pro Asn Ser Gln Ile Glu Tyr Thr Glu
500 505 510

Asp Glu Val Arg Ile Ala Gln Leu Ala Asp Lys Tyr Thr Thr Ser Asp
515 520 525

Gly Tyr Ile Phe Asp Glu His Asp Ile Ile Ser Asp Glu Gly Asp Ala
530 535 540

Tyr Val Thr Pro His Met Gly His Ser His Trp Ile Gly Lys Asp Ser
545 550 555 560

Leu Ser Asp Lys Glu Lys Val Ala Ala Gln Ala Tyr Thr Lys Glu Lys
565 570 575

Gly Ile Leu Pro Pro Ser Pro Asp Ala Asp Val Lys Ala Asn Pro Thr
580 585 590

Gly Asp Ser Ala Ala Ala Ile Tyr Asn Arg Val Lys Gly Glu Lys Arg
595 600 605

Ile Pro Leu Val Arg Leu Pro Tyr Met Val Glu His Thr Val Glu Val
610 615 620

Lys Asn Gly Asn Leu Ile Ile Pro His Lys Asp His Tyr His Asn Ile
625 630 635 640

Lys Phe Ala Trp Phe Asp Asp His Thr Tyr Lys Ala Pro Asn Gly Tyr
645 650 655

Thr Leu Glu Asp Leu Phe Ala Thr Ile Lys Tyr Tyr Val Glu His Pro
660 665 670

Asp Glu Arg Pro His Ser Asn Asp Gly Trp Gly Asn Ala Ser Glu His
675 680 685

Val Leu Gly Lys Lys Asp His Ser Glu Asp Pro Asn Lys Asn Phe Lys
690 695 700

Ala Asp Glu Glu Pro Val Glu Glu Thr Pro Ala Glu Pro Glu Val Pro
705 710 715 720

Gln Val Glu Thr Glu Lys Val Glu Ala Gln Leu Lys Glu Ala Glu Val
725 730 735

Leu Leu Ala Lys Val Thr Asp Ser Ser Leu Lys Ala Asn Ala Thr Glu
740 745 750

Thr Leu Ala Gly Leu Arg Asn Asn Leu Thr Leu Gln Ile Met Asp Asn
755 760 765

Asn Ser Ile Met Ala Glu Ala Glu Lys Leu Leu Ala Leu Leu Lys Gly
770 775 780

Ser Asn Pro Ser Ser Val Ser Lys Glu Lys Ile Asn
785 790 795
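
The 796 residues of SEQ ID NO: 2 correspond to SEQ ID NO: 1 read from its second base (1 + 3 x 796 = 2389), and the single Xaa at position 456 coincides with the codon containing N in the nucleotide sequence. The Python sketch below shows one way to spot-check that correspondence, assuming the standard genetic code; the `translate` helper and its `offset` parameter are illustrative names and are not part of the listing.

```python
from itertools import product

# Standard genetic code, with each codon position cycling through T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

def translate(dna, offset=0):
    """Translate complete codons of `dna`, starting at 0-based `offset`.

    Codons that contain an ambiguous base such as N are reported as 'Xaa'.
    """
    dna = dna.upper().replace(" ", "")
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3])
        residues.append(THREE_LETTER[aa] if aa else "Xaa")
    return residues

# First 27 bases of SEQ ID NO: 1, read from the second base.
print(translate("TTCTTACGAG TTGGGACTGT ATCAAGC", offset=1))
# Codons spanning the ambiguous base N in SEQ ID NO: 1.
print(translate("GCCTTGTTTGNAAAT"))
```

The first call yields the opening residues of SEQ ID NO: 2 (Ser Tyr Glu Leu Gly Leu Tyr Gln), and the second yields Ala Leu Phe Xaa Asn, matching residues 453 to 457.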



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AGTCGGATCC TTCTTACGAG TTGGGACTGT ATCAAGC 37



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

AGTCAAGCTT GTTTATTTTT TCCTTACTTA CAGATGAAGG 40
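
Apart from their first ten bases, SEQ ID NO: 3 and SEQ ID NO: 4 match the two ends of SEQ ID NO: 1: SEQ ID NO: 3 continues with the first 27 bases of SEQ ID NO: 1, and SEQ ID NO: 4 continues with the reverse complement of its last 30 bases. The Python sketch below checks this relationship. The ten-base tails contain GGATCC and AAGCTT, the recognition sequences of BamHI and HindIII; reading the tails as cloning sites is an assumption, since this listing does not state their purpose. The names `TAIL_LENGTH` and `reverse_complement` are illustrative.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# Termini of SEQ ID NO: 1, copied from the listing.
SEQ1_FIRST_27 = "TTCTTACGAGTTGGGACTGTATCAAGC"
SEQ1_LAST_30 = "CCTTCATCTGTAAGTAAGGAAAAAATAAAC"

SEQ3 = "AGTCGGATCCTTCTTACGAGTTGGGACTGTATCAAGC"
SEQ4 = "AGTCAAGCTTGTTTATTTTTTCCTTACTTACAGATGAAGG"

# Assumption: the first ten bases of each oligonucleotide form a 5' tail
# that is not derived from SEQ ID NO: 1.
TAIL_LENGTH = 10

forward_matches = SEQ3[TAIL_LENGTH:] == SEQ1_FIRST_27
reverse_matches = reverse_complement(SEQ4[TAIL_LENGTH:]) == SEQ1_LAST_30

print(forward_matches, reverse_matches)
```

Both comparisons hold for the sequences as listed, which is consistent with the two oligonucleotides bracketing the full 2389 base pairs of SEQ ID NO: 1.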



